Comments about the article in Nature: ChatGPT is a black box: how AI research can break it open

The following is a discussion of this Editorial article in Nature Vol 619, 25 July 2023.
To read the full text, follow this link: https://www.nature.com/articles/d41586-023-02366-2 In the last paragraph I give my own opinion.



Introduction

“I propose to consider the question, ‘Can machines think?’” So began a seminal 1950 paper by British computing and mathematics luminary Alan Turing (A. M. Turing Mind LIX, 433–460; 1950).
That requires a clear definition of what it means to think, which is difficult.
I know that I can think. I also assume that all humans can think. But I don't assume that my PC can think, because it is an electronic device.
But as an alternative to the thorny task of defining what it means to think, Turing proposed a scenario that he called the “imitation game”. A person, called the interrogator, has text-based conversations with other people and a computer. Turing wondered whether the interrogator could reliably detect the computer — and implied that if they could not, then the computer could be presumed to be thinking. The game captured the public’s imagination and became known as the Turing test.
I know that when I chat with ChatGPT it gives sensible answers, but I still consider ChatGPT a program that mimics a human.
Although an enduring idea, the test has largely been considered too vague — and too focused on deception, rather than genuinely intelligent behaviour — to be a serious research tool or goal for artificial intelligence (AI). But the question of what part language can play in evaluating and creating intelligence is more relevant today than ever. That’s thanks to the explosion in the capabilities of AI systems known as large language models (LLMs), which are behind the ChatGPT chatbot, made by the firm OpenAI in San Francisco, California, and other advanced bots, such as Microsoft’s Bing Chat and Google’s Bard. As the name ‘large language model’ suggests, these tools are based purely on language.
My mind is, in some sense, also purely based on language. At the same time I have eyes, which give me a different type of input that I can mix directly with the written text I read. That is an advantage users have compared with the rather limited capability of a computer's CPU, which can only handle numbers.
With an eerily human, sometimes delightful knack for conversation — as well as a litany of other capabilities, including essay and poem writing, coding, passing tough exams and text summarization — these bots have triggered both excitement and fear about AI and what its rise means for humanity. But underlying these impressive achievements is a burning question: how do LLMs work?
I understand the question, but I don't understand why nobody knows the answer.
As with other neural networks, many of the behaviours of LLMs emerge from a training process, rather than being specified by programmers.
I would expect that the influence of programmers is minimal.
As a result, in many cases the precise reasons why LLMs behave the way they do, as well as the mechanisms that underpin their behaviour, are not known — even to their own creators.
I expect that a lot depends on the training data. The people who are involved with ChatGPT must know that. They must have written documents which show how the answers can change if the training data is biased.
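To make my point concrete, here is a small toy example I wrote myself (plain Python; it is not taken from the article and has nothing to do with how OpenAI actually builds ChatGPT). It is a trivial next-word predictor that only counts which word follows which. Its "answer" is completely determined by the text it is trained on, so a biased corpus gives a biased answer.

from collections import Counter, defaultdict

def train(corpus):
    # Count, for every word, which words follow it in the training text.
    follows = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def complete(model, word):
    # Return the continuation seen most often during training.
    return model[word].most_common(1)[0][0]

# Two deliberately biased training corpora, invented for this illustration.
corpus_a = "coffee is healthy . coffee is popular . coffee is healthy ."
corpus_b = "coffee is harmful . coffee is bitter . coffee is harmful ."

print(complete(train(corpus_a), "is"))   # prints: healthy
print(complete(train(corpus_b), "is"))   # prints: harmful

The same question ("coffee is ...?") gets two opposite answers, purely because the training data is different. A real LLM is vastly more complicated, but its dependence on the training data is of the same kind.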
As Nature reports in a Feature, scientists are piecing together both LLMs’ true capabilities and the underlying mechanisms that drive them. Michael Frank, a cognitive scientist at Stanford University in California, describes the task as similar to investigating an “alien intelligence”.
Revealing this is both urgent and important, as researchers have pointed out (S. Bubeck et al. Preprint at https://arxiv.org/abs/2303.12712; 2023). For LLMs to solve problems and increase productivity in fields such as medicine and law, people need to better understand both the successes and failures of these tools. This will require new tests that offer a more systematic assessment than those that exist today.

1. Breezing through exams

LLMs ingest enormous reams of text, which they use to learn to predict the next word in a sentence or conversation. The models adjust their outputs through trial and error, and these can be further refined by feedback from human trainers. This seemingly simple process can have powerful results. Unlike previous AI systems, which were specialized to perform one task or have one capability, LLMs breeze through exams and questions with a breadth that would have seemed unthinkable for a single system just a few years ago. But as researchers are increasingly documenting, LLMs’ capabilities can be brittle. Although GPT-4, the most advanced version of the LLM behind ChatGPT, has aced some academic and professional exam questions, even small perturbations to the way a question is phrased can throw the models off.
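The quoted paragraph describes the recipe at a very high level: read text, predict the next word, and adjust when the prediction is wrong. To make that idea concrete for myself I wrote the following small sketch (it assumes the PyTorch library and a made-up training text of one line; it is of course not how GPT-4 is really built or trained).

import torch
import torch.nn as nn

# A tiny, made-up training text; real LLMs read billions of words.
text = "the cat sat on the mat . the dog sat on the rug ."
words = text.split()
vocab = sorted(set(words))
stoi = {w: i for i, w in enumerate(vocab)}

# Training pairs: every word is asked to predict the word that follows it.
xs = torch.tensor([stoi[w] for w in words[:-1]])
ys = torch.tensor([stoi[w] for w in words[1:]])

# A very small network: an embedding table followed by one linear layer.
model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# "Trial and error": guess the next word, measure the error, adjust the weights.
for step in range(200):
    logits = model(xs)
    loss = loss_fn(logits, ys)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, ask which word the model expects after "sat".
probs = torch.softmax(model(torch.tensor([stoi["sat"]])), dim=-1)[0]
best = max(vocab, key=lambda w: probs[stoi[w]].item())
print("after 'sat' the model predicts:", best)   # expected: on

Note that nowhere in this program is there a rule saying that "on" follows "sat"; that behaviour emerges from the training loop. On a microscopic scale this is what the editorial means when it says that the behaviours of LLMs emerge from training rather than being specified by programmers.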
I can clearly imagine that this brittleness is a problem. To have a good discussion with someone, all text used must be clear and simple. Text in this context means both the concepts used and the syntax.
For example, what does "can throw the models off" mean? And why "models" (plural)?
This lack of robustness signals a lack of reliability in the real world. Scientists are now debating what is going on under the hood of LLMs, given this mixed performance.
On one side are researchers who see glimmers of reasoning and understanding when the models succeed at some tests.
On the other are those who see their unreliability as a sign that the model is not as smart as it seems.

2. AI approvals


Reflection 1


Reflection 2


If you want to give a comment, you can use the following form: Comment form


Created: 20 December 2022

Back to my home page Index
Back to Nature comments Nature Index